ABSTRACT

In this paper we provide a basic introduction to the core
ideas and theories surrounding fault-tolerant quantum computation.
These concepts underlie the theoretical framework of large-scale
quantum computation and communications and are the driving force
for many recent experimental efforts to construct small to medium
sized arrays of controllable quantum bits. We examine the basic
principles of redundant quantum encoding, required to protect
quantum bits from errors generated by both imprecise control and
environmental interactions, and then examine the principles of
fault-tolerance from a largely classical framework. As quantum
fault-tolerance is essentially about avoiding the uncontrollable
cascade of errors caused by the interaction of quantum bits, these
concepts can be directly mapped to quantum information.

Categories and Subject Descriptors 

H.4 [Information Systems Applications]: Miscellaneous;
D.2.8 [Software Engineering]: Metrics - complexity measures,
performance measures

General Terms 

Quantum Information, Quantum Error Correction 

Keywords 

ACM proceedings, LaTeX, text tagging

1. INTRODUCTION

Fault-tolerant, error-corrected, digital quantum computing
underpins a significant worldwide effort to construct viable,
commercial quantum computing systems [?]. The size of such
error-corrected machines is somewhat daunting for a field that
has only managed to experimentally fabricate arrays of up to
about ten functional quantum bits (qubits) [?]. However, the
theoretical framework for fault-tolerant quantum computing has
existed for nearly 20 years and is very well understood, and
quantum computing must compete with the vast classical computing
power currently in existence. It would be unreasonable to believe
(even given the apparent computational advantage quantum
information processing has over classical computing) that a
small, error-prone array of qubits could computationally
outperform a classical system comprising potentially millions of
computing cores, each itself containing billions (or even
trillions) of transistors.

Fault-tolerant quantum computing refers to the framework of
ideas that allow qubits to be protected from quantum errors
introduced by poor control or environmental interactions
(Quantum Error Correction, QEC), together with the appropriate
design of quantum circuits that implement both QEC and encoded
logic operations in a way that prevents these errors from
cascading through the circuits [?]. By avoiding a cascade of
errors, there comes a point (when the fundamental accuracy of
individual qubits is high enough) where QEC corrects more errors
than are being created. Once this threshold has been reached,
expanding the size of the protective quantum code exponentially
decreases the failure rate of the encoded information and allows
us to run arbitrarily long quantum algorithms on noisy devices.

In this paper we will provide a basic introduction to some of
the key principles of QEC and then pivot into a discussion of
fault-tolerance techniques that have been investigated in the
classical computing world. As the goal of fault-tolerance is to
prevent errors from cascading uncontrollably, a large amount of
classical work can be easily transferred to the quantum world.

2. QUANTUM COMPUTING 

In this section we present the basic mathematical framework
for qubits, quantum gate operations and quantum circuits.
Further details can be found in a number of papers [2] and
books [?], both from the physics community and the computer
science community.

Quantum circuits represent and manipulate information in
qubits. A single qubit has an associated quantum state
$|\psi\rangle = (\alpha_0, \alpha_1)^T = \alpha_0|0\rangle + \alpha_1|1\rangle$.
Here, $|0\rangle = (1,0)^T$ and $|1\rangle = (0,1)^T$ are the
quantum analogues of the classical logic values 0 and 1,
respectively, and $\alpha_0$ and $\alpha_1$ are complex numbers
called amplitudes with $|\alpha_0|^2 + |\alpha_1|^2 = 1$.



Quantum measurement is defined with respect to a basis and
yields one of the basis vectors with a probability related to
the amplitudes of the quantum state. Common measurements are
known as Z- and X-measurements. Z-measurement is defined with
respect to the basis $(|0\rangle, |1\rangle)$. Applying a
Z-measurement to a qubit in state
$|\psi\rangle = \alpha_0|0\rangle + \alpha_1|1\rangle$ yields
$|0\rangle$ with probability $|\alpha_0|^2$ and $|1\rangle$ with
probability $|\alpha_1|^2$. Moreover, the state $|\psi\rangle$
collapses into the measured state (i.e. only the components of
$|\psi\rangle$ consistent with the measurement result remain).
X-measurement is defined with respect to the basis
$(|+\rangle, |-\rangle)$, where
$|\pm\rangle = \frac{1}{\sqrt{2}}(|0\rangle \pm |1\rangle)$.
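The measurement rules above can be illustrated with a minimal Python sketch. The function names `z_probabilities` and `x_probabilities` are hypothetical helpers introduced only for this illustration; the sketch simply evaluates the squared overlaps of the state with the two basis vectors.

```python
import math

def z_probabilities(a0, a1):
    """Probabilities of obtaining |0> and |1> in a Z-measurement."""
    return abs(a0) ** 2, abs(a1) ** 2

def x_probabilities(a0, a1):
    """Probabilities of obtaining |+> and |-> in an X-measurement."""
    plus = (a0 + a1) / math.sqrt(2)   # overlap <+|psi>
    minus = (a0 - a1) / math.sqrt(2)  # overlap <-|psi>
    return abs(plus) ** 2, abs(minus) ** 2

# Example state: |+> = (|0> + |1>)/sqrt(2)
a0 = a1 = 1 / math.sqrt(2)
print("Z-measurement:", z_probabilities(a0, a1))
print("X-measurement:", x_probabilities(a0, a1))
```

For the state $|+\rangle$ the Z-measurement is evenly split between the two outcomes, while the X-measurement returns $|+\rangle$ with (numerically) unit probability.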

A state may be modified by applying single-qubit quantum
gates. Each quantum gate corresponds to a complex unitary
matrix, and the gate function is given by multiplying that
matrix with the quantum state. The application of the $X$ gate
to a state results in a bit flip:
$X(\alpha_0, \alpha_1)^T = (\alpha_1, \alpha_0)^T$. The
application of the $Z$ gate results in a phase flip:
$Z(\alpha_0, \alpha_1)^T = (\alpha_0, -\alpha_1)^T$. A
$Y = iXZ$ gate can be thought of as both a bit flip and a phase
flip together on an individual qubit. These three gates are
important in the context of quantum errors.

The exponentiation of the Pauli matrices results in the
rotational gates $R_x$, $R_y$, $R_z$, parameterised by the angle
of the rotation [8]. Hence the bit flip is a rotation by $\pi$
around the X-axis, implying that $X = R_x(\pi)$, and the phase
flip is a rotation by $\pi$ around the Z-axis, such that
$Z = R_z(\pi)$ (both equalities hold up to a global phase). The
Hadamard gate is $H = R_z(\pi/2) R_x(\pi/2) R_z(\pi/2)$, again up
to a global phase, and can be used to take the computational
states $\{|0\rangle, |1\rangle\}$ into the superposition states
$|\pm\rangle = (|0\rangle \pm |1\rangle)/\sqrt{2}$, which have no
classical analogue.

In the context of fault-tolerant quantum computation, only one
interaction gate is needed: the controlled-not (CNOT) gate. The
CNOT gate is the quantum analogue of a binary XOR operation and
is designed to bit-flip the state of a target qubit, conditional
on the state of a control qubit. This gate can be employed on
certain quantum states to prepare entangled states. For example,
$\mathrm{CNOT}\,(|0\rangle + |1\rangle)|0\rangle/\sqrt{2} = (|00\rangle + |11\rangle)/\sqrt{2}$.
This state, known as a Bell state, has the property that
measuring one of the qubits produces a random result ($|0\rangle$
or $|1\rangle$ with a 50:50 probability), but once the state of
one qubit is measured, the state of the other qubit is also
determined.
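The Bell-state example can be checked with a short sketch that stores a two-qubit state as a list of four amplitudes. The ordering [|00>, |01>, |10>, |11>] and the helper names `tensor` and `cnot` are assumptions made for this illustration.

```python
import math

def tensor(q1, q2):
    """Two-qubit amplitudes, ordered [|00>, |01>, |10>, |11>]."""
    return [x * y for x in q1 for y in q2]

def cnot(state):
    """CNOT with the first qubit as control and the second as target."""
    a00, a01, a10, a11 = state
    return [a00, a01, a11, a10]  # the target flips only when the control is 1

s = 1 / math.sqrt(2)
plus = [s, s]        # (|0> + |1>)/sqrt(2)
zero = [1.0, 0.0]    # |0>

bell = cnot(tensor(plus, zero))
print(bell)  # amplitudes of |00>, |01>, |10>, |11>
```

Only the |00> and |11> amplitudes are non-zero after the gate, so the two measurement results are perfectly correlated.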

It is well known that the ability to perform arbitrary
rotations around two orthogonal axes (e.g. $R_z(\theta_1)$ and
$R_x(\theta_2)$ for arbitrary $\{\theta_1, \theta_2\}$) and to
couple arbitrary pairs of qubits with a CNOT gate is sufficient
to realise any $N$-qubit unitary operation. This gate set is
therefore quantum universal.

3. ERRORS AND QUANTUM ERROR CORRECTING CODES

There are two important differences between classical error
correction and quantum error correction. The first is the
no-cloning theorem [?], which states that it is impossible to
perfectly copy an unknown quantum state, i.e. there is no
operation $U$ that satisfies
$U|\psi\rangle|0\rangle = |\psi\rangle|\psi\rangle$ for an
unknown $|\psi\rangle$. Therefore, we are unable to protect
arbitrary quantum states against errors by simply making
multiple copies.

Secondly, any measurement of an arbitrary quantum state will
collapse the wavefunction describing the state. Hence detecting
errors in an encoded piece of quantum information by measuring
a certain subset of the encoded block will irrevocably destroy
the information content of that state. Therefore we need a
slightly different mechanism to protect encoded quantum
information.

The foundation of quantum error correction is still based on
classical coding theory; however, we need to design codes in a
slightly different manner. This is due both to the restrictions
on what we can theoretically do with quantum information and to
the possible errors that can affect qubits. Unlike classical
bits, which can only experience a flip
$|0\rangle \leftrightarrow |1\rangle$, qubits can also
experience phase errors,
$(|0\rangle + |1\rangle)/\sqrt{2} \leftrightarrow (|0\rangle - |1\rangle)/\sqrt{2}$.
Additionally, errors do not occur in a discrete manner. They are
most often continuous errors, such as a rotation around the X
axis by some angle $\epsilon$, or some incoherent (non-unitary)
error caused by the interaction with the outside environment.

For expedience we will only present the formalism for coherent
errors, those that can be represented by a unitary gate. In
this case, an error operator, $E$, acting on a qubit,
$|\psi\rangle$, can be decomposed into a linear superposition of
the identity, $X$ gates, $Z$ gates and both, $Y = iXZ$:
$E|\psi\rangle = a_1|\psi\rangle + a_2 X|\psi\rangle + a_3 Z|\psi\rangle + i a_4 XZ|\psi\rangle$.
If we could magically measure whether an $X$ and/or $Z$ error
occurred on a qubit (via some type of quantum measurement), the
state would collapse to one of the states
$\{|\psi\rangle, X|\psi\rangle, Z|\psi\rangle, XZ|\psi\rangle\}$
with probabilities $\{|a_1|^2, |a_2|^2, |a_3|^2, |a_4|^2\}$.
This converts a possibly continuous quantum error into a
discrete $X$ and/or $Z$ gate. While the errors themselves are
continuous (for very small errors, $|a_1|^2 \approx 1$), this
determination of what type of error has occurred converts small
errors into discrete bit or phase errors, with small amplitudes
converted to small probabilities for such results to be
observed. The question is, how do we detect if some type of
discrete error has occurred?
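As a small worked example of this discretisation, consider a coherent over-rotation around the X axis by an angle $\epsilon$. The convention $R_x(\epsilon) = e^{-i\epsilon X/2}$ is an assumption made here; it reproduces the bit flip at $\epsilon = \pi$ up to a global phase.

```latex
R_x(\epsilon)\,|\psi\rangle
  = \cos(\epsilon/2)\,|\psi\rangle - i\sin(\epsilon/2)\,X|\psi\rangle
\quad\Longrightarrow\quad
a_1 = \cos(\epsilon/2),\qquad a_2 = -i\sin(\epsilon/2),\qquad a_3 = a_4 = 0 .
```

Under this assumption, determining the error type leaves the state untouched with probability $\cos^2(\epsilon/2) \approx 1 - \epsilon^2/4$ and yields a full bit flip with probability $\sin^2(\epsilon/2) \approx \epsilon^2/4$ for small $\epsilon$.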

This detection occurs through the idea of redundant encoding
with two classical codes. One is designed to detect X-errors
and one is designed to detect Z-errors without having to,
necessarily, decode the codespace. Detecting an error
indirectly for bit flips is commonplace in classical computer
science and was adapted for the quantum regime. The simplest
example is the bit-flip code, with basis states given by
$|0\rangle_L = |0\rangle^{\otimes N}$ and
$|1\rangle_L = |1\rangle^{\otimes N}$, where the notation
$\otimes N$ simply means $N$ copies of the qubit. The basic idea
is that, for a given code size $N$, the number of physical flips
needed to turn $|0\rangle_L \leftrightarrow |1\rangle_L$ scales
linearly with $N$. Again, in the quantum regime, we are not
allowed to directly measure any subset of qubits in the code
block, so we need a different method to identify errors. In the
context of the bit-flip code, we notice a certain property,
namely that for both basis states, pairwise bit parity in the
code block is even (i.e. calculating the parity of any two bits
via modulo-2 addition for the $|0\rangle_L$ and $|1\rangle_L$
states gives an even result). If such a comparison ever results
in an odd value, we know an error has occurred without actually
knowing if we started with the $|0\rangle_L$ or $|1\rangle_L$
state. This is what we need. Therefore, we need a way to
calculate the parity of any two qubits in the code block
without directly measuring the qubits themselves. We can do
this via the circuit in Fig. ??. This circuit introduces an
ancilla qubit that is initialised, interacted with a pair of
qubits in the code block and then measured. The result of the
measurement on the ancilla (either $|0\rangle$ or $|1\rangle$)
will determine the parity of the two qubits (odd or even), and
also force these two qubits to be in an even or odd parity
state if they were not beforehand.
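The ancilla-based parity check can be sketched with a tiny state-vector simulation. The circuit assumed here (an ancilla prepared in |0>, followed by one CNOT from each of the two data qubits onto the ancilla) is a common way of realising such a check and is an assumption of this sketch, since the figure itself is not reproduced. The dictionary representation and the function names are hypothetical, and the sketch only handles states that already have a definite parity on the chosen pair.

```python
def cnot(state, control, target):
    """Apply a CNOT to a state stored as {tuple_of_bits: amplitude}."""
    out = {}
    for bits, amp in state.items():
        new = list(bits)
        if new[control] == 1:
            new[target] ^= 1
        key = tuple(new)
        out[key] = out.get(key, 0) + amp
    return out

def measure_parity(data_state, i, j):
    """Return (ancilla result, data state after the check) for qubits i, j."""
    n = len(next(iter(data_state)))
    # Append an ancilla initialised in |0> as the last qubit.
    state = {bits + (0,): amp for bits, amp in data_state.items()}
    state = cnot(state, i, n)
    state = cnot(state, j, n)
    outcomes = {bits[n] for bits in state}
    assert len(outcomes) == 1, "state has no definite parity on this pair"
    return outcomes.pop(), {bits[:n]: amp for bits, amp in state.items()}

# Encoded superposition 0.6|000> + 0.8|111>, then a bit flip on qubit 0.
encoded = {(0, 0, 0): 0.6, (1, 1, 1): 0.8}
corrupted = {(1, 0, 0): 0.6, (0, 1, 1): 0.8}

print(measure_parity(encoded, 0, 1)[0])    # even parity
print(measure_parity(corrupted, 0, 1)[0])  # odd parity: an error is flagged
print(measure_parity(corrupted, 1, 2)[0])  # even parity
```

In each case the data amplitudes are returned unchanged, which reflects the point of the construction: the ancilla reveals the parity but nothing about the encoded superposition.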

The principle of a codespace within quantum computation is to
construct encoded codewords that are always in certain,
well-defined parity states regardless of the state of the
encoded information. Physical errors will then perturb the
encoded information away from these well-defined parities,
which can be detected without determining any information
regarding the encoding.

Returning to the example of a redundancy code, the two encoded
states $|0\rangle_L$ and $|1\rangle_L$ are constructed to be
even parity states of any pairwise $Z$ operators, i.e. applying
the operator $Z_i Z_j$ for any pair of qubits $i, j$ in the
block of $N$ returns the same state,
$Z_i Z_j |0,1\rangle_L = |0,1\rangle_L$. Bit-flip errors result
in states which violate this condition. For example, a bit flip
on qubit one of the encoded block results in a state with odd
parity, $Z_1 Z_j X_1 |0,1\rangle_L = -X_1 |0,1\rangle_L$ for all
$j \neq 1$. Hence if we measure the parity of any of these
operators and we find an odd result, we know that some type of
error has occurred. Determining a location for the error and
how many unique errors we can identify depends on the size of
the code block, $N$. The parity of pairwise checks of the
$Z_i Z_{i+1}$ operators will allow us to uniquely locate
individual errors, and the number of errors we can successfully
correct scales linearly with the number of qubits in the code
block. For $N$ qubits, we are able to uniquely correct
$(N-1)/2$ errors.
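The role of the neighbouring parity checks can be illustrated with a classical sketch of the redundancy code. The decoder below is a hypothetical minimum-weight decoder chosen for the illustration: of the two error patterns consistent with a syndrome, it picks the one with fewer flips, which succeeds whenever at most $(N-1)/2$ flips occurred (for odd $N$). Qubits are indexed from 0 in the code.

```python
def syndrome(bits):
    """Parities of neighbouring pairs (the classical analogue of Z_i Z_{i+1})."""
    return [bits[i] ^ bits[i + 1] for i in range(len(bits) - 1)]

def decode(synd):
    """Lowest-weight error pattern that reproduces the given syndrome."""
    pattern = [0]
    for s in synd:
        pattern.append(pattern[-1] ^ s)
    complement = [1 - e for e in pattern]
    return pattern if sum(pattern) <= sum(complement) else complement

def correct(bits):
    """Apply the decoded correction to a corrupted block."""
    error = decode(syndrome(bits))
    return [b ^ e for b, e in zip(bits, error)]

# N = 5 can tolerate (5 - 1)/2 = 2 flips; here qubits 1 and 3 are flipped.
print(correct([1, 0, 1, 0, 1]))  # block that started as all ones
print(correct([0, 1, 0, 1, 0]))  # block that started as all zeros
```

Both corrupted blocks produce the same syndrome and the same correction, so the decoder never needs to know which codeword was stored.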


This example illustrates how we handle bit-flip errors within
quantum information, but what about phase errors? In quantum
information, phase-flip errors work exactly the same way as
bit-flip errors if we take our computational states as
$|\pm\rangle = (|0\rangle \pm |1\rangle)/\sqrt{2}$, i.e. a
$Z$-error will take $|+\rangle \leftrightarrow |-\rangle$. Hence
if we used a redundancy code of the form
$|0\rangle_L = |+\rangle^{\otimes N}$ and
$|1\rangle_L = |-\rangle^{\otimes N}$ and, instead of checking
the parity of the $Z_i Z_{i+1}$ operators, we check the parities
of the $X_i X_{i+1}$ operators for $i \in \{1, \dots, N-1\}$,
then everything works exactly the same way. Therefore, a
redundancy code using either $|0,1\rangle$ or $|+,-\rangle$
states will allow us to protect encoded information against
either X-errors or Z-errors. A full quantum error correction
code therefore combines two classical codes, independently
responsible for bit errors and phase errors.


The Shor code is the simplest example of this [9]. In the
Shor code we essentially have one redundancy code embedded
within another. The code encodes a single qubit of information
into nine physical qubits. The basis states are given by

$|0\rangle_L = \frac{1}{2\sqrt{2}}(|000\rangle + |111\rangle)(|000\rangle + |111\rangle)(|000\rangle + |111\rangle)$

$|1\rangle_L = \frac{1}{2\sqrt{2}}(|000\rangle - |111\rangle)(|000\rangle - |111\rangle)(|000\rangle - |111\rangle)$

We have three blocks of three qubits that effectively act as a
distance three redundancy code to correct bit flips in the way
that we described above. This allows us to correct a single
bit-flip error in any one of the three blocks. In principle,
this code can correct three bit-flip errors (provided each
error occurs in a separate block); however, as two or more
errors can occur in a single block, the code is in general
described as only having the capacity to correct a single
bit-flip error. Phase errors are corrected by comparing the
parity of pairs of blocks. Again, the code can in principle
correct more than one phase error (provided the errors occur in
specific locations), but in general only a single arbitrary
error is deterministically correctable.

4. FAULT-TOLERANCE

The effectiveness of QEC depends on how we implement quantum
circuits to realise the correction code. As with classical
computing, interactions between qubits during the execution of
the circuit lead to the copying of errors. If this happens in
an uncontrolled manner, the QEC code is overwhelmed and the
computation will fail. Therefore, we need to be very careful
when designing circuits such that this does not occur. While
there is not often a direct comparison between classical and
quantum information processing, the principles of
fault-tolerance in a classical framework are easily
transferable. We will therefore focus on some general
principles of fault-tolerance in the classical world which can
be mapped directly to quantum computing.

5. SHORT PARALLEL TO DISTRIBUTED SYSTEMS

The need for fault-tolerance in quantum computing can be
introduced by drawing parallels with distributed systems. The
following insights do not extend classical distributed systems
to quantum ones [10], but allow the problem of quantum
fault-tolerance to be formulated without introducing quantum
information and quantum computing. The discussion will focus on
processes and communication over point-to-point links.

The distributed system is modelled as crash-stop [1], such
that the processes can crash and never come back to life, there
is at least one fault detector module in the system and the
communication links are perfect (messages are not lost,
duplicated or inserted by fault). Additionally, the distributed
system includes a fault corrector, which is informed by the
fault detectors about faults requiring correction.

The main target of fault-tolerance, as presented in these
sections, is to control the propagation of faults between
processes. The presentation will focus on describing the
distributed system elements and the way these interact, but
without delving into details about the liveness and safety
properties of the presented protocols.

5.1 The processes

A process holds an abstract object $q$ and a black-box that
consists of a coin, two Boolean values and a real value $r$.
The black-box is a model for the process faults. The Boolean
values $b$ and $p$ are the results of tossing the coin twice.
The value true indicates heads, and false stands for tails.

None of the processes is aware of the Boolean values inside
the box. Furthermore, the processes do not control the
black-box and the coin toss. The coin toss is performed
randomly (with a probability $r$), meaning that none, one or
both values are generated at undetermined execution points of
the process. Therefore, there may be no coin tosses or multiple
coin tosses during the lifetime of a process. In order to
simplify the modelling, the probability $r$ is equal for all
the processes, and process failures are not correlated.

The actual state of a process is computed by the function
$q_a = q + b + p$, where the $+$ operation is the addition of
the two possible random faults modelled as Boolean values. A
correct process is one for which $q_a = q$, while, if
$q_a \neq q$, the process is called faulty. In the following,
$q$ will refer both to the object and to its state, depending
on the context it is used in. It should be noted that the
object $q$ is noncopyable¹, meaning that once constructed, it
can be transformed either locally or through distributed
computations, but cannot be copied between processes.

During process initialisation it is possible to initialise the
black-box values and to set the state of $q$ to a state chosen
from a discrete set, $s_q \in S$. Once the black-box has been
set up, there is no guarantee that the values have not been
changed by the coin tosses.

A terminating process returns a state $s_o \in S$. During
termination, the current $q$ is classified against all the
$S$-states, and the closest $s_o$ is returned. The
classification is probabilistic, meaning that if there is no
state $s_o$ for which $s_o = q$, then it could happen that the
returned state $s_o$ is actually orthogonal to the state $s_o'$
that was classified as closest to $q$.

Once a process is terminated, it cannot be brought back to
life and, if one would like to approximate the probability of
$q$ being in any of the states of $S$, the complete distributed
computation has to be repeated. More flexible initialisation
and termination procedures could benefit from a larger set $S$,
but this will negatively impact the practicality of system
implementation.

5.2 Communication and operations 

The Boolean black-box values of a process can be either read
and communicated to other processes, or updated by a second
process. A single process is not able to read its own values
and correct them. Value correction is performed only through
coordinated communication during a procedure similar to
consensus [1].

The processes are naive: if one of their Boolean values is
considered heads, then they assume the same holds for the other
processes. The processes are also lazy, meaning that the
communication of heads-values has an associated cost, such that
cost(heads)=1 and cost(tails)=0.

Ideally, every process should have its Boolean values always
set to false (tails). For this reason, in general, processes
avoid initialising their black-boxes with heads values.
Processes are not byzantine, and each time one communicates, it
will try, based on its own values, to convince its partner to
either flip or keep one of its values. More specifically, a
communication step between two processes $x$ and $y$ begins
with two correction rounds: firstly, $x$ sends its $b$ value to
$y$, requesting it to update its $b$ value to
$b_x \oplus b_y$. Afterwards, process $y$ sends its $p$ value
to $x$, asking it to update its $p$ value to $p_x \oplus p_y$.
The $\oplus$ function models the behaviour of a coin flip: the
coin returns to its initial value after two flips. During the
third communication round the control process applies the
distributed abstract operation $e$: the update of the state
$q_y$ (target) is a function of the state $q_x$ (control). The
$e$-operation will not be detailed in the context of the
analogy with distributed systems, and its cost is considered
zero.

¹ Similarly, in C++ a private copy constructor and copy
assignment operator are required for such classes.

Besides communicating and performing the distributed
operation $e$, the process can perform local (intra-process)
operations that transform the state of the local object $q$.
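The two correction rounds of a communication step can be sketched in a few lines of Python. The `Process` class and the `communicate` function are hypothetical names used only for this illustration; the abstract operation on the $q$ objects is deliberately left out, as in the text.

```python
from dataclasses import dataclass

@dataclass
class Process:
    b: bool = False  # True stands for heads, False for tails
    p: bool = False

def communicate(control: Process, target: Process) -> None:
    """One communication step, restricted to the two correction rounds."""
    # Round 1: the control sends b; the target updates b_y <- b_x XOR b_y.
    target.b = control.b ^ target.b
    # Round 2: the target sends p; the control updates p_x <- p_x XOR p_y.
    control.p = control.p ^ target.p

x = Process(b=True, p=False)   # control with a heads b-value
y = Process(b=False, p=True)   # target with a heads p-value
communicate(x, y)
print(x, y)
```

After one step the heads $b$-value of the control has appeared on the target and the heads $p$-value of the target has appeared on the control, so the two fault types travel in opposite directions. Repeating the step between the same two processes restores the initial values, as expected from two coin flips.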

5.3 Distributed processing 

As previously introduced, inter-process communication is both
an attempt to correct the black-box values and, at the same
time, a distributed computation. The simplest distributed
system executing a single communication step consists of two
processes, which we will call the control and the target. The
control initiates the communication, and is thus the requester
during the first communication round, while the target is the
requester of the second communication round. Additionally, a
process can be both control and target during separate
communication steps.

Generally, a distributed algorithm represents a series of
inter-process communication steps and local operations. There
are at least two types of algorithms: 1) distributed
correction, where processes communicate only to correct their
Boolean values; 2) distributed computation, in which processes
try to solve a computational problem. Algorithms compliant with
the first option generally consist only of communication steps
and no local operations. A well-defined correction protocol
(see Section 5.5) is the execution of coordinated
communication. Distributed computations (the second option)
include intra-process operations but neglect (do not
coordinate) the effect of the two correction rounds in each
communication step. The result is that uncoordinated correction
can lead to propagation of faults: the heads values are
transferred, without the processes having noticed, from a
faulty process to a correct one (see Section 5.7).

5.4 Fault-tolerant processes 

In general, fault-tolerance is achieved by using a
hierarchical (layered) approach. Assuming a failure probability
$r$ for each process, a set of two processes will fail
simultaneously with probability $r^2$, since process failures
are not correlated. The introduction of redundancies is the key
to achieving fault-tolerance, and there are two types of
possible redundancies: 1) computational redundancy, where the
same computation is repeated sequentially multiple times; 2)
resource redundancy, where multiple processes are abstracted as
a single logical process and the component processes are
executed in parallel.

Computational redundancy is equivalent to executing a
distributed algorithm in epochs, and to guaranteeing that after
a certain number of epochs a property of the algorithm is
achieved. Resource redundancy is generally used when up to $f$
faulty processes need to be tolerated. For example, the uniform
epoch consensus algorithm from [1] requires $N$ processes with
$N > 2f$.



Majorities (quorums [?]) are the most common option for
checking the introduced redundancies. The simple majority
($N/2 + 1$) of objects (processes, bits etc.) is used to
introduce fault-tolerant quantum computing in the following: a
fault-tolerant logical process is constructed from three (or
more) component processes (called components), and the logical
process is able to tolerate at most one faulty component. The
computation of quorums is detailed in Section ??.

The fact that the $q$ objects of each process are noncopyable
increases the difficulty of implementing fault-tolerance
through redundancy. Copying the $q$ state of an existing
process to a newly initialised one is not possible. As a
result, separate components are initialised into the same state
$q \in S$ at the start of the distributed algorithm and exactly
the same operations are applied on their objects.

This work presents, without loss of generality, how the
construction of logical processes is performed using
triple-modular redundancy (TMR) [6]. The logical state $q_l$ of
a logical process is a sequence of $n$ (in this work $n = 3$)
component process states: $q_l = \prod_{i=0}^{n-1} q_i$.
Transforming $q_l$ represents the transformation of each $q_i$.

The repetition code is the TMR counterpart in the field of
error-checking and error-correction methods. As a consequence,
the logical state $q_l$ should be interpreted as the encoding
of one of the component process states (the states of the
components are equivalent). The repetition code can be replaced
with more powerful codes like the Hamming code or surface codes
[?], but this aspect is not further addressed in this work.

After constructing a logical process from three freshly
initialised processes, it is possible to compute the simple
majority of the $b$-values from every component's black-box.
Additionally, after grouping three separate logical processes
(lower-level) into another logical process (higher-level), it
is also possible to keep track of $p$-value majorities.
Correspondingly, the highest-level logical process consists of
nine components (lowest-level) grouped into three logical
processes. There will be three $b$-value majorities and one
$p$-value majority. This hierarchical construction, where
logical constructs are embedded into one another, is known as
concatenation, and has the advantage of polynomially lowering
the failure probability of the resulting logical processes [8].
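To give a feeling for how redundancy lowers the failure probability, the following sketch evaluates a deliberately simplified model. It assumes a single fault type, independent component failures with probability $r$, and a perfect majority vote, so that a logical process built from three components fails only when at least two components fail; it also assumes that each further level applies the same rule to the level below. These assumptions, and the function names, belong to the sketch and not to the construction described above, where the two levels track different values.

```python
def tmr_failure(r):
    """Probability that at least two of three independent components fail."""
    return 3 * r**2 * (1 - r) + r**3

def concatenated_failure(r, levels):
    """Failure probability after applying the majority rule 'levels' times."""
    for _ in range(levels):
        r = tmr_failure(r)
    return r

for r in (0.1, 0.01, 0.001):
    print(r, tmr_failure(r), concatenated_failure(r, 2))
```

In this toy model the redundancy only helps when $r$ is below one half, where `tmr_failure(r)` is smaller than `r`; at $r = 0.5$ the logical failure probability equals that of a single component.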

5.5 Fault detectors 

The fault detectors used in the described distributed system
detect faulty component processes. For each logical process
there is a separate associated fault detector that interacts
with the components. A fault detector consists of a set of
low-level processes (called ancillae) which are initialised,
used for communicating with the component processes and
terminated. The output state of the ancillae is used to compute
a syndrome, i.e. to infer which component process is faulty. A
fault detector also contains two variables: the Boolean faulty
indicates if the associated logical process is faulty or not,
and the integer pos points to the faulty component.

Fault detectors can check either $b$-values (the components
are controls and the ancillae targets) or $p$-values (the other
way around, the components are targets and the ancillae
controls). Once more, without loss of generality, the following
fault detectors will be responsible only for $b$-values.

Ancillae are ordinary processes and can be affected by faults,
which are required not to propagate to the components. At the
same time, as process failures are probabilistic, a set of
ancillae is used by the detector for reaching (with high
probability) a trustworthy decision about the logical process
(see Section 5.8). A $b$-value detector responsible for a
logical process protected against $b$-value faults has the
process components as controls and the ancilla as target during
the communication steps. If the ancilla holds a $p$-value set
to heads, this will propagate to the components, but it will
not influence the $b$-value protection. Likewise, if the
ancilla holds a $b$-value fault, this will not propagate given
the communication protocol.

A logical process could be faulty beyond correction when a
majority of the components is faulty. Assuming that all the
communication between the logical processes is transversal
(see Section 5.7), and because process faults are uncorrelated,
it would be improbable that such processes exist. Their
existence would be the result of a high $r$ (one above what the
quantum computing literature calls the error threshold), but
can be mitigated by using more powerful encodings (e.g. the
surface code). Therefore, it is further assumed that the right
encoding was chosen for the logical process states, and that
faulty components form a minority.

Computing a simple majority of correct processes from a set
of components is equivalent to finding the faulty components
forming a minority. For the distributed system example, a
minority consists of at most one component. Two ancillae are
required: the first one compares the $b$-values of the first
and the second components, and the second ancilla compares the
$b$-values of the second and the third components.

The two ancillae are initialised in the same known state
$q_a$. Let the component $b$-values be $b_0, b_1, b_2$; the
ancilla output states after termination will be
$q_{a_1} = q_a + (b_{a_1} \oplus b_0 \oplus b_1) + p_{a_1}$ and
$q_{a_2} = q_a + (b_{a_2} \oplus b_1 \oplus b_2) + p_{a_2}$.
Considering that initially the ancillae are not faulty, the
fault detector will extract two bits of information,
$s_1 = b_0 \oplus b_1$ and $s_2 = b_1 \oplus b_2$, indicating
how the $b$-values compare pairwise between the components. The
extension to faulty ancillae is presented in Section 5.8.

The syndrome bits $s_1$ and $s_2$ encode the index of the
faulty component process. For $s_1 = s_2 = 0$ no faulty
component exists, and the fault detector sets its faulty flag
to false. For all the other syndrome values, the detector sets
faulty=true, and the faulty component index is computed by
$pos = s_2 (2 - s_1)$, which maps the syndromes
$(s_1, s_2) = (1,0), (1,1), (0,1)$ to the components 0, 1
and 2.
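The syndrome decoding can be sketched with an explicit lookup table, derived directly from the definitions $s_1 = b_0 \oplus b_1$ and $s_2 = b_1 \oplus b_2$ under the assumption that at most one component holds a differing $b$-value. The table and the function names are choices made for this illustration.

```python
def extract_syndrome(b0, b1, b2):
    """Pairwise comparisons performed by the two (fault-free) ancillae."""
    return b0 ^ b1, b1 ^ b2

# Syndrome (s1, s2) -> index of the single component that differs.
LOOKUP = {(0, 0): None, (1, 0): 0, (1, 1): 1, (0, 1): 2}

def detect(b0, b1, b2):
    """Return the detector variables (faulty, pos)."""
    pos = LOOKUP[extract_syndrome(b0, b1, b2)]
    return pos is not None, pos

for values in [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]:
    print(values, detect(*values))
```

Only comparisons between components enter the syndrome, so the detector flags the component that disagrees with the other two without reading any individual value directly.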

5.6 Fault corrector 

The fault corrector communicates with all the fault detectors
in the system, and has a global overview of all the faults that
were detected during the execution of the distributed
computation. The global perspective has the following
advantage: the corrector can observe whether the modelled
failure rates $r$ are valid or not, for example whether the
modelled failure rate is too low.

In the presence of faults (signalled by the detectors), the
fault corrector has two options: to either correct the faults,
or to try and track their effect throughout the distributed
algorithm. Direct correction could introduce failures, and for
this reason fault-tracking is more advantageous. Fault-tracking
is performed based on commutativity properties: it is known how
faults are transformed by both local and global operations.
Hence, corrections are required only after the distributed
computation has terminated and the output states have been read
out from the distributed system.

5.7 Transversality 

The transversal application of a logical operation (local or
distributed) is its decomposition into (local or distributed)
operations applied on the component processes. For example,
the logical local operation $G_l$ is the application of $G$ on
each of the $n$ components.

Faults are propagated by inter-process communication. In this
section, propagation is illustrated by a distributed system
with two logical processes, each constructed from three
component processes. Propagation will be mitigated by
transversal communication.

Let $q_l^c$ be the logical state of the logical control, and
$q_l^t$ the state of the logical target. It is further assumed
that in both logical processes (control and target) at most one
component has its $b$-value set to heads. Once more, it should
be noted that the processes are not aware of their values. The
logical states were transformed by transversal logical
operations, resulting in three equivalent component states in
each logical process: $q_0^c = q_1^c = q_2^c$ and
$q_0^t = q_1^t = q_2^t$.

As the component states in both control and target are
equivalent, inter-process communication between the logical
processes is implemented by forming pairs between control and
target components. There are two possibilities: 1) the same
control component is paired with each target component (see
Figure ??); 2) each control component is paired with a
different target component (see Figure ??). The second scenario
corresponds to transversal inter-process communication.

For the first scenario, assuming that the component process
$0_c$ (the component indexed 0 in the control) is used, pairs
of the form (control, target) are built: $(0_c, 0_t)$,
$(0_c, 1_t)$ and $(0_c, 2_t)$. The three communication steps
result in the updated $b$-values of the target components,
$b_{i_t} = b_{i_t} \oplus b_{0_c}$. Assuming $b_{0_c}$ is heads,
the control fault was propagated to all the target components.
At this point it cannot be guaranteed that the logical target
is protected against a component's single heads $b$-value.
Furthermore, the total cost of communication between the
logical processes is $c_1 = 3 \times cost(heads)$.

The second scenario, the transversal inter-process
communication, could result in the following three pairs being
formed: $(0_c, 0_t)$, $(1_c, 1_t)$, $(2_c, 2_t)$. Maintaining
the assumption of the component $0_c$ being faulty in the
$b$-value, after the three communication rounds only the state
of $0_t$ would be negatively affected
($b_{0_t} = b_{0_t} \oplus b_{0_c}$). As a result, after
executing this communication scenario, it can be guaranteed
that the logical target is still protected against a single
heads value of $b$. The total cost of communication between the
logical processes is $c_2 \le 1 \times cost(heads)$.
Transversality minimises the communication cost between the
logical processes, because $c_1 > c_2$.
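The difference between the two pairing scenarios can be reproduced with a short sketch that tracks only the $b$-values. The function `communicate_logical` and the list-based representation are hypothetical; heads is encoded as 1 and tails as 0, so that the communication cost is the number of heads values sent.

```python
def communicate_logical(control_b, target_b, pairs):
    """Apply the b-value round for each (control index, target index) pair."""
    target = list(target_b)
    cost = 0
    for c, t in pairs:
        target[t] ^= control_b[c]  # the target updates b_t <- b_t XOR b_c
        cost += control_b[c]       # cost(heads) = 1, cost(tails) = 0
    return target, cost

control = [1, 0, 0]  # component 0 of the control holds a heads b-value
target = [0, 0, 0]

shared = [(0, 0), (0, 1), (0, 2)]       # scenario 1: same control component
transversal = [(0, 0), (1, 1), (2, 2)]  # scenario 2: transversal pairing

print(communicate_logical(control, target, shared))
print(communicate_logical(control, target, transversal))
```

With the shared control component every target component ends up with a heads value and the cost is three, whereas the transversal pairing leaves a single affected target component and a cost of one.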

5.8 Computational redundancy 

Transversality is the key to constructing fault-tolerant
operations on logical processes, but it is not applicable for
maintaining a consistent set of component processes. A
different technique has to be devised. In the presentation of
the process model it was mentioned that faults can occur at any
time: the black-box coin is tossed at random time points. The
toss could happen before each local or distributed operation.
Faults are also allowed to occur before a process is
terminated: after the last operation, but before returning its
final state.

The solution is to continuously check and correct every
logical process in the system. Checking is performed by the
fault detectors and corrections are applied by the fault
corrector.

A detection round consists of multiple epochs, and a logical
process is checked continuously (over multiple rounds).

Section 5.5 introduced the fault detector and its use of
ancillae, but the ancillae were considered correct. In the
presence of faults, the syndrome bits could be incorrect and
trigger an unnecessary correction that would introduce more
faults. The solution is to repeat the syndrome extraction
procedure multiple times during a detection round, similar to a
sequence of epochs (an example of computational redundancy).
Every epoch requires a new pair of ancillae, and the fault
detector will perform majority voting between the three pairs
of extracted syndrome bits.
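The majority vote over repeated syndrome extractions can be sketched as follows. The function name and the choice to return `None` when no syndrome reaches a strict majority are assumptions of this illustration; how a detector should react in that case is not specified here.

```python
from collections import Counter

def majority_syndrome(epochs):
    """Return the syndrome pair seen in more than half of the epochs, if any."""
    value, count = Counter(epochs).most_common(1)[0]
    return value if count > len(epochs) / 2 else None

# Three epochs; the second extraction is corrupted by a faulty ancilla.
print(majority_syndrome([(1, 0), (1, 1), (1, 0)]))
# No strict majority among three different results.
print(majority_syndrome([(1, 0), (1, 1), (0, 1)]))
```

In the first call the single corrupted extraction is outvoted, so the detector would act on the syndrome reported by the two consistent epochs.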

A freshly initialised process that was detected as being
faulty could be either corrected or directly terminated, in
which case a new process instance would need to be initialised.
During the execution of a process (between local and
distributed operations), correction is the only option. Process
operations could be delayed by the complete detection and
correction procedures, because the detectors require multiple
executions in order to reach a probabilistically consistent
decision. Thus, fault-tolerance introduces significant resource
and computational overheads.

6. CONCLUSIONS 